
[RFC] Engine Refactor Proposal #3752

Draft · wants to merge 1 commit into develop

Conversation

@ashwinvaidya17 (Contributor) commented Jul 19, 2024

TL;DR

This PoC proposes, and is meant to start a discussion on, refactoring the current OTX Engine. The refactor would make tasks more flexible, reduce code duplication, and directly expose all features of the upstream frameworks. The code changes here are at a very high level.

Intro

This is a very limited PoC (think of it as pseudo-code) to get the discussion rolling. It is not as extensive in features as what Harim built last year (https://github.com/harimkang/training_extensions/tree/harimkan/feature-v2-ultralytics/src/otx/v2/adapters/torch/ultralytics). The main reason is that the Anomalib Engine does a lot of things under the hood that would require major effort to re-implement in OTX. I'll share the presentation on Teams. It would be nice to get feedback on this to flesh it out further. We can also schedule 1:1s to discuss any concerns and adjust accordingly.

Motivation

  • Currently, the Anomaly task is limited, and is constrained by the requirement of moulding it to match the OTX Model, Engine, and DataLoader. This involves duplicating code, re-implementing Anomalib features, and a complex inheritance hierarchy, which makes maintenance harder and can lead to the same features diverging across repos.
  • The Anomalib Engine also internally takes care of attaching the right pre/post-processing callbacks, modifying the training loop based on the learning type, setting up model-specific transforms, and updating dataloaders based on the task type. Supporting these in OTX without Anomalib's Engine would require a lot of effort and some complex logic.
  • From the Geti and OTX perspective, the only output artefacts we care about are the metrics produced, the input dataset, and the output annotations. This means that, as long as we have the right converters, the tasks internally have more flexibility in the representation they use for models and datasets. And if we pass the trained model weights directly to the upstream framework and produce a standard result, the models don't need to conform to a specific format.

Approach

  • The Engine can remain as-is and will work like the AutoEngine proposed by Harim in his initial design; see engine_poc.py. (A rough sketch of this split follows the list.)
  • The contents of the current Engine can be extracted into a separate adapter that trains the current OTX model.
  • Another adapter can be introduced for Anomalib, along with an AnomalibEngine, to use Anomalib's own engine directly for training.
  • Add a Datumaro adapter in Anomalib so it can ingest datasets from OTX and Geti.
  • Anomalib, OTX, and ModelAPI will require updating so that the OpenVINO models produced by both frameworks are identical and can use the same inferencers.
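
A rough sketch of the Engine/adapter split, using the names that appear in the engine_poc.py hunks quoted in the review below (Adapter, AnomalibEngine, METRICS). OTXAdapter and the method bodies are illustrative placeholders, not the actual PoC implementation:

    from abc import ABC, abstractmethod
    from typing import Any

    METRICS = dict[str, Any]  # type alias; assumed to mirror otx.engine.base.METRICS

    class Adapter(ABC):
        """Backend-specific trainer wrapper that the single Engine dispatches to."""

        @abstractmethod
        def train(self, model: Any, datamodule: Any, **kwargs) -> METRICS:
            ...

    class OTXAdapter(Adapter):
        """Hypothetical adapter: the current Engine's training loop moves here."""

        def train(self, model: Any, datamodule: Any, **kwargs) -> METRICS:
            raise NotImplementedError  # placeholder for the extracted OTX loop

    class AnomalibAdapter(Adapter):
        """Delegates training to Anomalib's own Engine, keeping its callbacks."""

        def __init__(self) -> None:
            from anomalib.engine import Engine as AnomalibEngine
            self._engine = AnomalibEngine()

        def train(self, model: Any, datamodule: Any, **kwargs) -> METRICS:
            self._engine.fit(model=model, datamodule=datamodule)
            return dict(self._engine.trainer.callback_metrics)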

Outcome

  • This will make it easier to expose all of Anomalib's models and features, such as zero-shot, few-shot, and LLM-based models, as well as synthetic data generation.
  • Since the underlying models remain the same for Anomalib and OTX, models trained in one framework will be interchangeable with the other.
  • Further, we can unify the OpenVINO models exported from Anomalib and OTX so that they are interchangeable.

API Example

A single Engine serves as the entry point for all backends.

This lets us directly use not only all the models of the other framework but also its dataloaders.

from otx.engine.engine_poc import Engine
from anomalib.data import MVTec
from anomalib.models import Padim

engine = Engine()
model = Padim()
datamodule = MVTec(root="./datasets/MVTec")
engine.train(model, datamodule)

An Ultralytics example, just to show that we can extend this to other frameworks.

from torch.utils.data import DataLoader
from ultralytics import YOLO
from ultralytics.data import ClassificationDataset  # import paths may vary across ultralytics versions
from ultralytics.utils import DEFAULT_CFG

model = YOLO("yolov8n.pt")
engine = Engine()
dataset = ClassificationDataset(root="./datasets/hazelnut_toy", args=DEFAULT_CFG)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
engine.train(model=model, datamodule=dataloader)

Current OTX models remain compatible and use the same entry points.

from otx.algo.classification.mobilenet_v3 import MobileNetV3ForMulticlassCls
from otx.core.config.data import SubsetConfig
from otx.core.data import OTXDataModule
from otx.core.types.task import OTXTaskType

# simple_transforms: a list of torchvision transforms, defined elsewhere in the PoC

datamodule = OTXDataModule(
	task=OTXTaskType.MULTI_CLASS_CLS,
	data_format="cifar",
	data_root="./datasets/cifar/cifar-10-batches-py",
	train_subset=SubsetConfig(
		batch_size=64,
		subset_name="data_batch_1",
		transform_lib_type="TORCHVISION",
		transforms=simple_transforms,
	),
	val_subset=SubsetConfig(
		batch_size=64,
		subset_name="test_batch",
		transform_lib_type="TORCHVISION",
		transforms=simple_transforms,
	),
	test_subset=SubsetConfig(
		batch_size=64,
		subset_name="test_batch",
		transform_lib_type="TORCHVISION",
		transforms=simple_transforms,
	),
)
model = MobileNetV3ForMulticlassCls(
	label_info=datamodule.label_info,
)
engine = Engine()
engine.train(model, datamodule, max_epochs=1, deterministic=False, val_check_interval=0.5)

CLI Registration

If the Engine class is annotated correctly to accept models from different frameworks, we can directly expose these models to the CLI:

class Engine:
    ...
    def train(
        self,
        model: OTXModel | AnomalyModule | UltralyticsModel,
        ...
    ) -> METRICS: ...


class CLI:
    def add_subcommands(self) -> None:
        ...
        subparser.add_method_arguments(Engine, "train")
        ...
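
For illustration, a minimal self-contained sketch of this mechanism. It only assumes jsonargparse's public add_method_arguments API; the model classes are stand-ins, not the actual cli_poc.py:

    from jsonargparse import ArgumentParser

    class BaseModel: ...
    class ModelA(BaseModel): ...  # stand-in for e.g. an Anomalib model
    class ModelB(BaseModel): ...  # stand-in for e.g. an Ultralytics model

    class Engine:
        def train(self, model: BaseModel, max_epochs: int = 10) -> dict:
            return {}

    parser = ArgumentParser()
    # jsonargparse reads train()'s signature: the class-typed `model` parameter
    # becomes a subclass-selectable argument, so every BaseModel subclass is
    # accepted via a class path and listed in --help.
    parser.add_method_arguments(Engine, "train")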

Models from the upstream repos are automatically exposed and can work with OTX:

python cli_poc.py train --help
...
Train the model:
  --model.help CLASS_PATH_OR_NAME
                        Show the help for the given subclass of {OTXModel,AnomalyModule,Model} and exit.
  --model MODEL         (required, type: OTXModel | AnomalyModule | Model, known subclasses: otx.core.model.base.OTXModel, otx.core.model.base.OVModel, anomalib.models.Cfa, anomalib.models.Cflow, anomalib.models.Csflow, anomalib.models.Dfkde,
                        anomalib.models.Dfm, anomalib.models.Draem, anomalib.models.Dsr, anomalib.models.EfficientAd, anomalib.models.Fastflow, anomalib.models.Fre, anomalib.models.Ganomaly, anomalib.models.Padim, anomalib.models.Patchcore,
                        anomalib.models.ReverseDistillation, anomalib.models.Rkde, anomalib.models.Stfpm, anomalib.models.Uflow, anomalib.models.WinClip, anomalib.models.AiVad, ultralytics.engine.model.Model, ultralytics.YOLO,
                        ultralytics.YOLOWorld, ultralytics.FastSAM, ultralytics.NAS, ultralytics.RTDETR, ultralytics.SAM)
...

Signed-off-by: Ashwin Vaidya <[email protected]>
@harimkang (Contributor) left a comment:

Thank you for bringing back my initial design and reviewing it. I believe there are pros and cons to both the current and past designs; while this PR's design maximizes scalability, it looks quite different from the Engine's current policy. There still seems to be a lot to discuss.

  • AutoEngine is not the right name in my opinion. I think it would be better to keep Engine but define each backend's sub-engine as **Engine (e.g. AnomalibEngine).

Contributor:

In my opinion, the main focus of this PR is refactoring the Engine-related interfaces for scalability. Our CLI already has the functionality of this code, so if this PoC is going to be merged into develop, we need to look at compatibility with the existing CLI code.

I think the following points should be discussed for the existing CLI code along with these changes.

  1. We need to make sure that our configuration allows us to receive DataModules other than OTXDataModule. -> I think this is possible with some tweaks to add_argument (a sketch follows).
  2. We need to think about how to build Datasets from external frameworks, other than through a DataModule, in the CLI.
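
A hedged sketch of the kind of add_argument tweak meant here, assuming jsonargparse's add_subclass_arguments API (the nested key "data" is illustrative):

    from jsonargparse import ArgumentParser
    from lightning.pytorch import LightningDataModule

    parser = ArgumentParser()
    # Accept any LightningDataModule subclass (OTXDataModule, AnomalibDataModule,
    # an Ultralytics wrapper, ...) rather than hard-coding OTXDataModule.
    parser.add_subclass_arguments(LightningDataModule, "data")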

Anyway, for the CLI part, I think it's better to wait until the Engine's design is finalized and then see if there are any issues.

@ashwinvaidya17 (author):

Yeah, the main intention of this PR is to propose the change to the Engine class. I included the CLI for completeness, and also to ensure that the new design does not break it.

from otx.engine.base import METRICS, Adapter


def wrap_to_anomalib_datamodule(datamodule: OTXDataModule) -> AnomalibDataModule:
Contributor:

Do we really need this?
Can't we just use anomalib.data.Folder?
(like https://github.com/openvinotoolkit/anomalib/blob/main/configs/data/folder.yaml)
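
For reference, a minimal sketch of what the Folder-based alternative could look like; the directory layout is illustrative, and the argument names follow anomalib v1's Folder datamodule:

    from anomalib.data import Folder

    datamodule = Folder(
        name="hazelnut_toy",             # dataset name (illustrative)
        root="./datasets/hazelnut_toy",  # illustrative path
        normal_dir="good",               # sub-folder of normal images
        abnormal_dir="crack",            # sub-folder of anomalous images
    )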


class AnomalibAdapter(Adapter):
    def __init__(self):
        self._engine = AnomalibEngine()
Contributor:

It feels like the Engine holds the Adapter, and the Adapter in turn holds an Engine. Even if they have the same name, wouldn't it be better to use a different name to distinguish them (e.g. trainer)?

@ashwinvaidya17 (author):

Yeah, the naming is super tricky here. The problem is that, internally, the Anomalib Engine also has a trainer. So currently it is Engine -> Adapter -> Engine (Anomalib) -> Trainer, which isn't the best. I am open to alternatives.

Comment on lines +50 to +55
def train(
    self,
    model: OTXModel | AnomalyModule | Model,
    datamodule: OTXDataModule | AnomalibDataModule | UltralyticsDataset,
    **kwargs,
) -> METRICS:
Contributor:

Currently, the arguments for each function in the Engine are very well thought out (per command), so I think we should keep the existing arguments as much as possible.
The guiding principle of the OTX Engine is 1 Model = 1 Engine, and I don't currently see any reason to break this principle.
Therefore, I think it is correct for the model to be received in __init__. (A sketch of this style follows.)
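
For illustration, a minimal sketch of the 1 Model = 1 Engine style being described; the names are placeholders, not the actual Engine signature:

    class Engine:
        def __init__(self, model, backend: str = "base", **kwargs) -> None:
            # The model is fixed at construction time; every command
            # (train/test/export/...) then operates on this one model.
            self.model = model

        def train(self, max_epochs: int = 10, **kwargs) -> dict:
            ...  # no `model` parameter here: 1 Model = 1 Engine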

Contributor:

def train(
        self,
        max_epochs: int = 10,
        seed: int | None = None,
        deterministic: bool | Literal["warn"] = False,
        precision: _PRECISION_INPUT | None = "32",
        callbacks: list[Callback] | Callback | None = None,
        logger: Logger | Iterable[Logger] | bool | None = None,
        resume: bool = False,
        metric: MetricCallable | None = None,
        run_hpo: bool = False,
        hpo_config: HpoConfig = HpoConfig(),  # noqa: B008 https://github.com/omni-us/jsonargparse/issues/423
        checkpoint: PathLike | None = None,
        adaptive_bs: Literal["None", "Safe", "Full"] = "None",
        **kwargs,
    ) -> dict[str, Any]:

I like the current way of doing things better, where we clearly define what we need for each command as an argument to each function, as shown above. Also, I don't see any reason to change the model through the function.

Current OTX: the Engine is created (the model is determined) -> the E2E flow uses this model.
This PR's design: there's room for the model to change with every command. -> Do we need this use case?
I believe the above was discussed and decided in the first Engine design proposal, and I don't see any reason to change it at this time.

@ashwinvaidya17 (author):

Yeah, I prefer explicit arguments as well. I didn't include all of them here because I wanted to focus on the adapter. I also think we should have explicit arguments so that they get serialised by the CLI when creating the YAML file. Maybe this will get polished after a few iterations of discussion.

Contributor:

Yes, I see. The reason I commented was that I thought it would be a good idea to avoid breaking existing Engine usage as much as possible.

def adapter(self, adapter: Adapter) -> None:
    self._adapter = adapter

def get_adapter(self, model: OTXModel | AnomalyModule | Model) -> Adapter:
@harimkang commented Jul 22, 2024:

It's weird to tie adapters to models. I think it would be better to be able to distinguish between adapters and models and have the adapter received in Engine's __init__, but what do you think?
-> I believe that Adapters and Models should be somewhat independent.

def __init__(
    self,
    ...
    backends: Literal["base", "anomalib", "..."] = "base",
    ...
) -> None:

logger = logging.getLogger(__name__)


class Engine:
Contributor:

I believe the current structure is an implementation that doesn't take auto-configuration into account at all. Is auto-configuration considered in the design? (A sketch of how it could fit follows.)
In my opinion, auto-configuration is one of OTX's key features.
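
For context, a hedged sketch of how auto-configuration could sit alongside the proposed design; default_model_for is a hypothetical helper, not an existing OTX API:

    def default_model_for(task):
        """Hypothetical helper: return OTX's recommended model for a task."""
        raise NotImplementedError

    class Engine:
        def __init__(self, model=None, data_root=None, task=None, **kwargs):
            if model is None:
                # Auto-configuration path: infer the task from data_root if
                # needed, then fall back to OTX's per-task default model.
                model = default_model_for(task)
            self.model = model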

)


class UltralyticsTrainer(BaseTrainer):
Contributor:

Why is this class separate from Adapter?

@harimkang (Contributor) left a comment:

To avoid any mistakes, I'm going to Request Changes on this PR first.

@ashwinvaidya17 (author):

@harimkang, an alternative to Adapter could be some sort of factory class, since we pass the model to the __init__ method. Here is a different proposal:

from abc import ABC, abstractmethod
from typing import Any

from anomalib.models import AnomalyModule
from lightning.pytorch import LightningModule


class BaseEngine(ABC):
    @abstractmethod
    def train(self, **kwargs):
        pass

    @classmethod
    @abstractmethod
    def is_valid_model(cls, model: Any) -> bool:
        pass


class EngineRegistry:
    """Registry of available engines, populated via the register decorator."""

    engines = set()

    @classmethod
    def register(cls):
        def decorator(engine_class: type[BaseEngine]):
            cls.engines.add(engine_class)
            return engine_class

        return decorator


@EngineRegistry.register()
class AnomalibEngine(BaseEngine):
    def __init__(self, model: AnomalyModule, **kwargs):
        print("Anomalib Engine")

    def train(self, **kwargs):
        pass

    @classmethod
    def is_valid_model(cls, model: Any) -> bool:
        return isinstance(model, AnomalyModule)


@EngineRegistry.register()
class LightningEngine(BaseEngine):
    def __init__(self, model: LightningModule, **kwargs):
        print("Lightning Engine")

    def train(self, **kwargs):
        pass

    @classmethod
    def is_valid_model(cls, model: Any) -> bool:
        return isinstance(model, LightningModule)

Then we can initialise the right engine, something like this. Here, Engine serves as the AutoEngine but acts like a factory class:

class Engine:
    def __new__(
        cls, model: Any, **kwargs
    ) -> BaseEngine:  # pass all init parameters to this
        for engine in EngineRegistry.engines:
            if engine.is_valid_model(model):
                return engine(model, **kwargs)
        # Fail loudly if no registered engine accepts the model, instead of
        # implicitly returning None.
        msg = f"No registered engine supports model type {type(model)}"
        raise ValueError(msg)

And to test

class MockAnomalyModule(AnomalyModule):
    def __init__(self):
        pass

    def learning_type(self):
        pass

    def trainer_arguments(self):
        pass


class MockLightningModule(LightningModule):
    def __init__(self):
        pass


if __name__ == "__main__":
    model = MockAnomalyModule()
    engine = Engine(model)
    assert isinstance(engine, AnomalibEngine)

    model = MockLightningModule()
    engine = Engine(model)
    assert isinstance(engine, LightningEngine)

@wonjuleee (Contributor) left a comment:

Could you clarify what components are included in the Anomalib trainer, so that we can customize the OTX engine accordingly?

@ashwinvaidya17 (author):

@wonjuleee good question. The separation of the torch model in Anomalib was something we designed initially to make model export easier. However, the export mechanism in Anomalib has changed over time, and we are looking into embedding the post-processing in the exported model. Additionally, the Anomalib lightning model stores certain metadata that is necessary during different stages of the pipeline.

https://github.com/openvinotoolkit/anomalib/blob/d1f824a5798262891dbbe583fb291e1cf9aa7d2a/src/anomalib/models/components/base/anomaly_module.py#L49-57

These are themselves configured by callbacks in the engine. The callbacks are hidden here so that users do not modify them inadvertently.
https://github.com/openvinotoolkit/anomalib/blob/d1f824a5798262891dbbe583fb291e1cf9aa7d2a/src/anomalib/engine/engine.py#L408

Current OTX Anomaly models inherit from Anomalib's models (like Padim), but they also need these callbacks to work. These are present in the task, but they are quite limited and cannot be configured.

The lightning model also stores model-specific transforms that are used to configure the datamodule if required.
https://github.com/openvinotoolkit/anomalib/blob/d1f824a5798262891dbbe583fb291e1cf9aa7d2a/src/anomalib/engine/engine.py#L398

Hope this clarifies your question.

@sovrasov removed the OTX 2.0 label on Oct 11, 2024